on Power-Efficient Fault Tolerant Microarchitecture for Chip Multiprocessors
Authors
Abstract
Relentless scaling of silicon fabrication technology, coupled with lower design tolerances, is making ICs increasingly susceptible to wear-out related permanent faults as well as transient faults. A well-known technique for tackling both transient and permanent faults is redundant execution, specifically space redundancy, wherein a program is executed redundantly on different processors, pipelines or functional units and the results are compared to detect faults. In this paper, we describe a power-efficient architecture for redundant execution on chip multiprocessors (CMPs) which, when coupled with our per-core dynamic voltage and frequency scaling (DVFS) algorithm, significantly reduces the power overhead of redundant execution without sacrificing performance. Using cycle-accurate simulation combined with an architectural power model, we estimate that our architecture reduces dynamic power dissipation in the redundant core by a mean value of 76% with an associated mean performance overhead of only 1.2%. We also present an extension to our architecture that enables the use of cores with faulty functional units for redundant execution without a reduction in transient fault coverage. This extension enables the usage of faulty cores, thereby increasing yield and reliability with only a modest power-performance penalty over fault-free execution.
Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar
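The space-redundancy idea described in the abstract (run the same work twice and compare the results) can be illustrated with a toy software sketch. This is not the paper's microarchitecture: the names `checksum`, `flip_bit` and `run_redundantly` are hypothetical, the two "cores" are simply two calls to the same function, and the transient fault is injected by hand as a single bit flip.

```python
from typing import Callable, Optional, Sequence, Tuple


def checksum(values: Sequence[int]) -> int:
    """Toy workload (an assumption for illustration): a 32-bit rolling checksum."""
    acc = 0
    for v in values:
        acc = (acc * 31 + v) & 0xFFFFFFFF
    return acc


def flip_bit(word: int, bit: int) -> int:
    """Model a transient fault as a single bit flip in a result word."""
    return word ^ (1 << bit)


def run_redundantly(
    work: Callable[[Sequence[int]], int],
    data: Sequence[int],
    fault_bit: Optional[int] = None,
) -> Tuple[int, bool]:
    """Run `work` twice and compare the two results.

    Returns (leading_result, fault_detected). If `fault_bit` is given,
    the redundant copy's result is corrupted to simulate a transient fault.
    """
    leading = work(data)      # result from the first ("leading") execution
    redundant = work(data)    # result from the second (redundant) execution
    if fault_bit is not None:
        redundant = flip_bit(redundant, fault_bit)
    return leading, leading != redundant
```

With no injected fault the two results agree and nothing is flagged; with `fault_bit` set, the mismatch is detected. Comparison alone only detects the fault; recovering from it needs a separate mechanism that this sketch does not model.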
Similar resources
Power Efficient Redundant Execution for Chip Multiprocessors
This paper describes the design of a power efficient microarchitecture for transient fault detection in chip multiprocessors (CMPs). We introduce a new per-core dynamic voltage and frequency scaling (DVFS) algorithm for our architecture that significantly reduces power dissipation for redundant execution with a minimal performance overhead. Using cycle accurate simulation combined with a simple ...
CAFT: Cost-aware and Fault-tolerant routing algorithm in 2D mesh Network-on-Chip
The increasing complexity of chips and the need to integrate more components into a chip have made network-on-chip an important infrastructure for on-chip communication and a good alternative to traditional bus-based approaches. As chip density increases, the possibility of failure in the on-chip network increases and providing correction and fault tol...
Web Search for a Planet: The Google Cluster Architecture
Few Web services require as much computation per request as search engines. On average, a single query on Google reads hundreds of megabytes of data and consumes tens of billions of CPU cycles. Supporting a peak request stream of thousands of queries per second requires an infrastructure comparable in size to that of the largest supercomputer installations. Combining more than 15,000 commodity-...
Framework for Simulation of Heterogeneous MpSoC for Design Space Exploration
Due to the ever-growing requirements in high performance data computation, multiprocessor systems have been proposed to solve the bottlenecks in uniprocessor systems. Developing efficient multiprocessor systems requires effective exploration of design choices like application scheduling, mapping, and architecture design. Also, fault tolerance in multiprocessors needs to be addressed. With the a...
Error Handling in Wormhole Networks
Contents: Introduction; Fault Model Notation and Error Control Scheme for S2S Buses on NoC [1]; Analysis of Error Recovery Schemes for Networks on Chips ...